Physics-informed neural networks for modeling physiological time series for cuffless blood pressure estimation

The bold vision of AI-driven pervasive physiological monitoring, through the proliferation of off-the-shelf wearables that began a decade ago, has created immense opportunities to extract actionable information for precision medicine. These AI algorithms model input-output relationships of a system that, in many cases, exhibits complex nature and personalization requirements. A particular example is cuffless blood pressure estimation using wearable bioimpedance. However, these algorithms need training over significant amount of ground truth data. In the context of biomedical applications, collecting ground truth data, particularly at the personalized level is challenging, burdensome, and in some cases infeasible. Our objective is to establish physics-informed neural network (PINN) models for physiological time series data that would use minimal ground truth information to extract complex cardiovascular information. We achieve this by building Taylor’s approximation for gradually changing known cardiovascular relationships between input and output (e.g., sensor measurements to blood pressure) and incorporating this approximation into our proposed neural network training. The effectiveness of the framework is demonstrated through a case study: continuous cuffless BP estimation from time series bioimpedance data. We show that by using PINNs over the state-of-the-art time series models tested on the same datasets, we retain high correlations (systolic: 0.90, diastolic: 0.89) and low error (systolic: 1.3 ± 7.6 mmHg, diastolic: 0.6 ± 6.4 mmHg) while reducing the amount of ground truth training data on average by a factor of 15. This could be helpful in developing future AI algorithms to help interpret pervasive physiologic data using minimal amount of training data.


INTRODUCTION
AI algorithms provide unique opportunities for extracting complex actionable information from wearable physiological time series data for precision medicine 1 . These techniques, in particular machine learning (ML) and deep learning (DL) approaches, model the input-output relationships of the system, where in many cases, this system exhibits a complex nature and personalized requirements, e.g., blood pressure (BP) 2,3 , cardiac output 4 , atrial fibrillation 5 , arterial characteristics 6,7 , hemoglobin levels 8 , stress 9 , infection 10,11 estimations using wearable sensor measurements. It has been repeatedly noted that for many healthcare applications current ML-and DL-based modeling approaches require significant amounts of ground truth training data comprising subject-level information collected from a large number of patients [12][13][14][15][16] . Collecting this ground truth data at individual levels often requires the use of invasive or obtrusive medical-grade measurement systems (e.g., arterial line or cuff-based sphygmomanometer for peripheral BP monitoring), and therefore is challenging, burdensome, and in some cases infeasible. To address this gap, we need to establish high-fidelity learning models for time series data that rely on reduced amounts of ground truth data.
For many engineering or biological systems, there exists a vast body of domain knowledge that may be leveraged in training deep neural networks (DNNs) to reduce the reliance on ground truth (i.e., labeled) data. A promising direction is the construction of physicsinformed neural networks (PINNs) 17 , where the neural networks are trained to solve scientific problems leveraging underlying physics laws structured by generalizable nonlinear partial differential equations (PDEs). This is achieved by augmenting neural network training with a unique loss function that includes these PDEs in addition to the standard supervised loss. Therefore, during training, PINN weights are optimized to minimize the loss function that accounts for additional physical constraints. PINNs have proven to be highly effective in solving many complex engineering problems (e.g., fluid mechanics 18 , cyber-physical systems 19 , power systems 20 , molecular biology 21 ) with limited experimental data. However, given the inter-subject variations in the cardiovascular system, the relationships that connect wearable measurements to cardiovascular parameters are not well-defined in the form of generalized PDEs 22,23 . For example, using hand-crafted equations defined between cardiovascular parameters and wearable time series data is infeasible since; (i) the features of these wearable measurements provide a proxy for physiological parameters that are not generalizable, e.g., pulse transit time-based BP estimation requiring frequent calibration due to its dependency on personalized arterial properties that are not accessible with wearables 24 , (ii) these equations fail to track and adapt to time-dependent changes in cardiovascular dynamics e.g., increasing BP, with heart rate increasing due to increased vagal tone or decreasing due to increased sympathetic activity [25][26][27] . Therefore, there is an unmet need to establish new ways to leverage PINNs for time series data in cardiovascular applications.
In this work, we propose to establish PINNs for extracting essential cardiovascular parameters (e.g., BP) from physiological time series data with limited use of ground truth data. We achieve this by building Taylor's approximation for certain gradually changing cardiovascular phenomena, e.g., establishing the relationship between physiological features extracted from 1 Fig. 1 Physics-informed neural network (PINN). a The deep neural network (DNN) model uses input time series measurements (e.g. bioimpedance, BioZ) to estimate continuous systolic, diastolic, and pulse pressure values. Taylor's approximation is defined for physiological features extracted from BioZ, and BP is used to guide neural network training. The parameters of the approximation are calculated with DNN auto-differentiation of predictions with respect to input features. This approximation is compared with the DNN predictions to estimate values for the physics-based loss function. b The definition of indexes for the sequential segmented input bioimpedance data, x bioimpedance sensor measurements and BP. This approximation yields a Taylor approximation polynomial in the form of a PDE that includes partial derivatives (i.e., gradients) of the output with respect to the input. The values of these gradients are obtained with auto-differentiation that is inherently available in neural networks. We calculate a remainder term (i.e., residual or physicsbased loss) from the difference between Taylor's approximation and the neural network predictions and include it in the model loss function. This enables the optimization of neural network weights based on the total loss function resulting from the sum of the standard loss and the residual. The Taylor polynomial can be used to generate an approximation of the output for any input, without the use of the corresponding true labels. Therefore, the physicsbased loss can be calculated for neural network predictions for all inputs. This would lead to obtaining predictions that show minimal deviation from the approximated Taylor polynomial.
The effectiveness of the framework is demonstrated through a comprehensive case study on continuous cuffless BP estimation from wearable time series bioimpedance data. BP is a significant cardiovascular parameter frequently used by clinicians to assess cardiac and circulatory health along with their associated risk factors and disorders [28][29][30][31] . Conventional BP measurementsyielding systolic (SBP), diastolic (DBP), and pulse pressure (PP) valuesare based on oscillometric cuff inflation/deflation which causes discomfort, precluding frequent use 32 . Therefore, this case study; (i) includes a cuffless wearable BP estimation scenario that generates time series data, e.g., bioimpedance, (ii) targets a problem with clinical importance, (iii) requires the acquisition of ground truth data that is challenging to obtain, (iv) utilizes complex cardiovascular dynamics driving the translation of wearable bioimpedance into BP [33][34][35] . We focus on a single wearable modality (e.g. bioimpedance) for our time series measurements to ensure that the underlying physics remain consistent for one modality (see Methods and Supplementary Note 1 for additional details on bioimpedance). Supplementary Note 2 provides a summary of the related work on cuffless BP monitoring technologies that highlight the challenges in the domain as well as our unique contributions. Figure 1a shows a high-level representation of the proposed framework demonstrated through the selected case study. The input features for our models are extracted from non-invasive bioimpedance sensors placed at the participants' wrists or fingers in a wearable form factor (see Supplementary Fig. 1 36 . The proposed PINNs retain high accuracy (ME ± SDE, systolic: 1.3 ± 7.6 mmHg, diastolic: 0.6 ± 6.4 mmHg, pulse pressure: 2.2 ± 6.1 mmHg), while decreasing the required amount of ground truth data, on average, by a factor of 15, based on the comparison with the state-of-the-art time series regression models (see Supplementary Tables 2-4). We provided an additional proof-ofconcept study to show that the optimized PINN models demonstrate a consistent approximation of the input-output relationship for varying amounts of training ground truth data.

RESULTS
Physics-informed neural network model for cuffless BP estimation A conventional deep neural network (DNN) is trained through supervised learningalso known as inductive learning -where model weights are optimized based on a loss function that uses a labeled training dataset (i.e., the true output is known). PINN models are transductive, meaning that in contrast to inductive learning, the models make use of additional information present in the unlabeled (i.e., the true output is unknown) input data 37 . This information is integrated to the model through a modification of the conventional loss function, L conventional , which is originally calculated using the model predictions and true output labels. In contrast, the modified loss function includes an additional loss term, L physics , as shown in Eq. (1).
Here, L total represents the modified model loss function due to the addition of L physics to conventional supervised loss, and α and β indicate the weights of each loss-functions. The weight assignment is not a straightforward process 18,[38][39][40] . We selected α and β to be 1 and 10, respectively which initially results in a degree of magnitude higher loss values obtained with L conventional compared to L physics with an example shown in Supplementary  Fig. 2. The initially higher values for the supervised loss ensure that the model estimations are bounded by the true range of BP, where eventually, as the supervised loss decreases, the model weight optimization focuses on satisfying the physics-based constraint. Note that an inaccurate definition of L physics may degrade model accuracy as it may lead to misguided weight optimization. We construct L physics based on Taylor's approximation of the known physiological dynamics driving the translation of bioimpedance to BP, e.g., hemodynamic relationships defined between blood volume, arterial compliance, and BP 33,41 : let x * i : fx 1 i ; ; x N i g be the N dimensional time series bioimpedance data segmented based on the start and end of the i-th cardiac cycle and sampled to have N points for each segment, u * i : fu 1 i ; ; u M i g be the M dimensional vector corresponding to M features calculated from x * i , and Θ denote the neural network weights (Fig. 1b). The neural network generates an output, y NN based on x * , u * , and Θ, such that y NN ¼ f NN ð x * ; u * ; ΘÞ: Here, f NN is the approximated function by the neural network. See Methods for the formation of Taylor's approximation polynomial, P i ð x , represents the approximated non-stationary relationship between physiological input features and output BP. The evaluation of L physics across the complete unsupervised input set enables neural network predictions to be aware of the input-output relationships approximated with Taylor polynomial. The use of limited supervised points allows the neural network to obtain blood pressure predictions that satisfy the physical constraint defined with L physics rather than arbitrary values. To assess the effectiveness of the physics-integration in DNN training, we create an identical DNN architecture, inputs (i.e., segmented bioimpedance waveforms and physiological features), and training/testing data for both the proposed PINN and the conventionally trained DNN. The only difference is the definition of the loss functions for these models, i.e., the PINN includes the additional loss term, L physics . Figure 1c shows the DNN model architecture, that uses a combination of convolutional neural networks (CNN) and dense layers to generate a prediction of the reference ground truth information, such as BP (see Methods for details of model architecture and hyperparameters).
The bioimpedance sensors placed along the peripheral (e.g., radial, digital) arteries capture quasi-periodic waveforms. We defined three features (i.e., u * : fu 1 ; u 2 ; u 3 g) extracted at every heartbeat window from pre-processed bioimpedance signals based on their physiological relevance to hemodynamic features that exhibit certain relationships with BP. With every heartbeat, a pressure pulse propagates through the arteries causing an expansion in artery volume and an increase in pressure. The change in volume and pressure is based on the elastic wall properties of the arteries. An increase in volume with pulse arrival K. Sel et al. causes a drop in the bioimpedance amplitude due to the blood's higher conductivity than surrounding tissue 34 . The first feature, u 1 , corresponds to the level of amplitude change in the bioimpedance waveform, providing a proxy to an increase in blood volume 34 . Under varying BP, the arterial wall characteristics also affect the blood pulse wave velocity (PWV). Higher PWV results in the earlier arrival of the reflected pulse wave (e.g., due to arterial tree branching of the radial artery to the digital arteries) 42,43 . Hence, it causes an earlier secondary drop in bioimpedance amplitude when the reflected pulse reaches the arterial site. The second feature, u 2 , measures the inverse of the time difference between these two impedance drops, e.g., caused by the arrivals of the systolic and reflection waves, and gives an indirect proxy to PWV and artery elasticity 44,45 . Lastly, the third feature, u 3 ; is beatto-beat heart rate (HR), measured from the time difference between the end and the beginning of the waveform. The feature definitions are shown in Fig. 2. We share the details of bioimpedance signal pre-processing and relationships regarding arterial dynamics in the Methods section and Supplementary  Fig. 3. For each blood pressure parameter (SBP, DBP, and PP), we trained a separate model with output prediction based on the type of blood pressure provided as ground truth data during supervised training.

Study design
We use three bioimpedance-BP datasets obtained as a part of previous studies from our group: (i) graphene-HGCPT 3 ; (ii) calfree-HGCPT 46 ; (iii) ring-CPT 47 . These datasets include bioimpedance waveforms captured with a wearable form-factor sensor placed along the participants' wrist (i.e., radial) and finger (i.e., digital) arteries (see Supplementary Fig. 1). For all datasets, the reference BP values come from a medical-grade BP monitoring finger cuff device, Finapres NOVA. The details of the datasets and human subject participation are shared in the Methods section. We build personalized models (i.e., models trained and tested on the data from the same participant) for N ¼ 15 individuals (graphene-HGCPT dataset: N ¼ 6, calfree-HGCPT dataset: N ¼ 5, ring-CPT dataset: N ¼ 4). All three datasets include wide BP ranges, reaching to hypertensive scale (i.e., systolic >140 mmHg, and diastolic >90 mmHg), due to the inclusion of BP elevation maneuvers (e.g., hand gripper: HG, cold-pressor test: CPT) in the standard protocol during data collection (Supplementary Table 1).

PINN model training, testing and performance evaluation
The proposed implementation of PINNs provides unique opportunities to train with minimal ground truth data for accurate time series translation. In contrast, traditional state-of-the-art ML and DNN models built for time series regression require training on large amounts of labeled data to offer acceptable performance. We assess the PINN performance against the CNN having the same neural network architecture for estimating SBP, DBP, and PP with minimally labeled data. The minimal training criterion is defined as neural network training with a set of labeled training instances, with each instance in the set randomly selected from the uniformly distributed and segmented BP values (see Figs. 3a 30 Hz for neural network training. The peak and zero-crossing points in the derivative signal are used to mark the start and end of each cardiac cycle. ÀΔZ increases with the arrival of the blood pressure pulse wave at the artery due to an increase in the artery volume. b Characteristic points on a single cardiac cycle bioimpedance beat. The amplitude change, u 1 , provides a proxy for the amount of arterial expansion. The second feature, u 2 , corresponds to the inverse of the relative time difference between the forward traveling (i.e., systolic) wave and reflection wave giving an estimate of the pulse wave velocity (PWV). The third feature, u 3 gives beat-to-beat heart rate. We provide an additional comparison of the performance obtained with proposed PINNs with three state-of-the-art time series classical regression models: AdaBoost regressor 49 , Rocket regressor 50 , Random Forest regressor 51 , and four state-of-the-art time series deep learning models 52,53 : Long-short-term Memory (LSTM) 54 , CNN+Bi-GRU+Attention 55 , Residual Network (ResNet) 53,56 , Transformer 57 . The classical regression models are trained with; (i) minimal training criterion, (ii) 4-fold crossvalidation (~75% labeled data is used in training), (iii) 8-fold cross-validation (~88% labeled data is used in training). All deep learning models are trained with the minimal training criterion, where 20% of the training instances are used for validation purposes, such that the training weights that yield the lowest validation loss are used for obtaining BP estimations. The results of this analysis are shared in Supplementary Tables 2-4, showing that compared to the state-of-the-art time series machine learning and deep learning models, PINNs retain low RMSE and high correlations with limited training data (see Methods section for the details regarding the models). On average across all datasets, the RMSE and correlation for PINNs are 7.1 mmHg and 0.82 for SBP and 5.7 mmHg and 0.76 for DBP, respectively, outperforming all other models; the classical regression models obtain RMSE and correlation values ranging from 8.9-10.8 mmHg, 0.57-0.62 for SBP and from 6.6-8.1 mmHg, 0.57-0.64 for DBP, respectively, and the deep learning models obtain RSME and correlation values ranging from 11.8-16.5 mmHg, 0.09-0.59 for SBP and from 8.4-12.4 mmHg, 0.17-0.64 for DBP, respectively. For classical machine learning models, we observe that, as the number of data points increases (e.g. 8-folds vs. 4-folds, and 4-folds vs. minimal training), per-subject correlations fluctuate rather than improve. We believe that this is due to the underlying complexity of the input-BP relationship, where the high number of training instances may cause the classical models to refrain from making liberal predictions that is necessary to track changes in BP -and hence lead to poor correlationswhile achieving better estimates of the general trend in BPand hence lead to improved RMSE. For deep learning models, we observe similar performances obtained with CNNs, LSTMs, and Transformers, whereas CNN+Bi-GRU +Attention and ResNet models provided BP estimations with significantly higher RMSEs and poor correlations with limited training data. We believe that the significantly poorer performances with these two models when trained with limited training data are due to their high complexity incorporating either very deep CNN layers or a combination of multiple deep and complex neural network architectures.

Interpretation of Taylor polynomial parameters
The neural network learning obtains function f that approximates the complex relationship between input and output. Due to the presence of a physics-based loss in the loss function, after network optimization, this function has a minimal difference with the evaluations of approximated Taylor polynomial, P. This polynomial is constructed with the use of features extracted from input times series measurements that are physiologically relevant to BP, e.g., u 1 feature and peripheral blood volume change. Therefore, the parameters of P relate these features into complex cardiovascular parameters based on the underlying physiological mechanism. In addition, when this underlying physiological mechanism remains the same (e.g., different iterations of the cold-pressor test), these parameters are expected to demonstrate comparable behavior. However, if the input features have low correlation with the proxy hemodynamic parameter (e.g., caused by noise in measurements due to motion artifacts), the association between the parameters of P and the underlying physiological mechanism may become convoluted. Nevertheless, even in this case, PINNs retain high accuracies since the Taylor polynomials still provide a representation of non-trivial input-output mapping that guides the model predictions.
For a proof-of-concept, we select a participant (SID15) whose first input feature shows high correlations with SBP (absolute value of Pearson's correlation coefficient 0.9, see Fig. 6a, b) and conducted a post-analysis of the PINN trained over minimally provided SBP labels (e.g. 65 out of 878 ground truth labels used for training). Partial differentials (i.e., gradients) calculated with auto-differentiation are defined for SBP predictions and input features, u * . We split the testing dataset into different subsets and trials representing different CPT and recovery sessions and plotted the partial differential value distribution across varying SBP. Through this analysis, we aim to assess; (i) the general behavior of the gradient-output distribution, (ii) the changes in this distribution for different sessions, and (iii) the consistency of this distribution across multiple iterations of the same type of session. Figure 6 shows the results of this analysis, plotted separately for all three features in u * . The gradient for the first feature measures a relative change in blood pressure with a change in u 1 , which provides a proxy for arterial volume change in the peripheral arteries (e.g., digital, radial) 34 . The sympathetic stimulation caused by an external stressor (e.g., during CPT) leads to an increase in SBP 58,59 . Meanwhile, the peripheral arteries may observe vasoconstriction (i.e., narrowing of blood vessels by small muscles in their walls) 60,61 . Therefore, a negative gradient between BP and peripheral artery volume may be observed. We observe that u 1 decreases with increasing y NN , where the corresponding gradient, ∂y NN =∂u 1 has a consistent negative amplitude in agreement with the feature behavior ( Fig. 6a-c top subplots). This negative amplitude can be associated with the vessel vasoconstriction due to CPT. The change in volume/pressure is also a factor of the arterial characteristicse.g., artery compliance, that exhibit a nonlinear pressure dependent relationship 2,62,63 . We observe the magnitude of the gradient increases with increasing SBP, which can be associated with the underlying arterial wall compliance characteristics that drive the volume-pressure relationships (See Methods section and Supplementary Fig. 3).
The second-feature gradient (i.e., ∂y NN =∂u 2 ) is related to the non-linear relationship between PWV and SBP (See Methods and Supplementary Fig. 3). This hemodynamic parameter is conventionally calculated based on measurements across proximal and distal points along the arterial tree 64,65 . The physical separation of two measurement sites enables to capture of the time delay for the arrival of the BP pulse wave, i.e., pulse transit time (PTT), where PWV is d=PTT, with d being the distance between distal and proximal sensors. In our case, u 2 , only provides a proxy to PWV 44,45 , given that there is only a single-channel bioimpedance measurement used in extracting u 2 . We observe that this feature shows sudden changes in a short time frame for higher levels of SBP, dominating the changes at lower SBP levels. Therefore, the corresponding gradient result in having high magnitudes at lower SBP levels achieved during recovery sessions (green triangles in Fig. 6c subplots), and values closer to zero at higher SBP levels achieved during CPT (yellow circles in Fig. 6c subplots). Nevertheless, the decrease in magnitude with increasing SBP can be associated with the non-linear SBP-PWV relationship (see Methods and Supplementary Fig. 3).
The third feature, u 3 , measures beat-to-beat HR, where its relationship with BP exhibits dynamic nature, affected by various physiological feedback mechanisms (e.g., vagal activity, baroreceptor reflex) 25,26 . It has been previously shown that during the CPT, HR may show an initial increase (e.g., increased sympathetic activity) followed by a decrease (e.g., increased vagal tone) with increasing SBP. We observe a similar pattern in the feature distribution against SBP as shown in Fig. 6b (see mean trend line in red color). The gradient for the third feature, i.e., rate of change in SBP with changing beat-to-beat heart rate, shows positive amplitudes for lower ranges of SBP and negative amplitudes for higher ranges of SBP in agreement with the mean trend in the feature distribution. This change in polarity may be explained by the increased vagal tone dominating the sympathetic activity with increasing SBP during the CPT [25][26][27] . To further assess the consistency of the parameters establishing the Taylor polynomial across different amounts of information provided to the neural network, an additional analysis is provided. The purpose of this analysis is to demonstrate that the optimized PINN models obtain a consistent approximation of the input-output relationships defined by the Taylor polynomial under varying ground truth data used in model training. To assess this, we train a total of 137 separate models, with each model corresponding to a different number of training instances and data coming from a single individual (SID15): we create a set of initial training points, with each point randomly picked from BP data sorted and segmented into bins, where each bin has 0.5 mmHg steps. This results in 125, 77, and 69 training points for SBP, DBP, and PP, respectively. We start from a single training point and gradually increased the number of training points with two randomly picked points from the aforementioned training sets (e.g., this resulted in 63 models for SBP, labeled training instances: one to 125). Figure 7a, b and Supplementary Fig. 7a, b show the testing error for PINN and conventional CNN models based on RMSE and Pearson's correlation coefficient, as the number of labeled training instances grows for SBP, DBP, and PP, respectively. We observe that the PINN models retain consistently high performance (less than 10 mmHg RMSE and 0.85 correlation) when trained for more than ten labeled data points. Figure 7c-e, show the distribution of the gradients, ∂y NN =∂ u * , with respect to y NN , respectively, where the distributions are grouped based on the number of labeled instances used for training. We observe that ∂y NN =∂u 1 and ∂y NN =∂u 3 show a consistent distribution (less than 0.10 standard deviation in the discovered trend calculated within the group) with as low as fewer than 40 labeled training instances. Whereas the second gradient, ∂y NN =∂u 2 , demonstrates a higher deviation in discovered trend due to the highly varying distribution of the second feature, u 2 , that exhibit sudden changes during high SBP values. The results of this analysis demonstrate the potential of using a limited number of discrete ground truth data points to train the neural networks that produce robust predictions for the remaining majority of input (e.g., more than N = 800 samples) based on the unique integration of Taylor polynomial to the model training. Details of this analysis is shared in the Methods section.
Model performance under out-of-distribution and intersubject evaluation strategies We tested the PINNs against the state-of-the-art models for two additional evaluation strategies. The first evaluation strategy assesses the models' ability to perform predictions with testing data samples that have no previous examples in the training dataset. To establish such evaluation paradigm, we compose the training set using the samples at lower and higher BP ranges, while the remaining BP range is used for testing the model (see Supplementary Table 17, based on values averaged over all participants (N = 15). We observed that PINNs overperform all other learning models, achieving −1.3 ± 7.0 mmHg and −0.7 ± 5.8 mmHg estimation accuracies, corresponding to 1.5 and 1.4 times lower standard deviation of errors than the best performance obtained with the state-of-the-art-models, for SBP and DBP respectively. The higher performance with PINN is due to the physics-based constrained on the model predictions based on the approximated input-output relationships. For the second evaluation strategy, we assessed the models' ability to perform predictions in inter-subject settings (e.g., leaveone-subject-out training), where we trained the models based on the data from all subjects except one (N = 14), and tested on the excluded subject (N = 1), with four-point calibration (see Supplementary Note 3). Supplementary Figs 8-9 show the Bland-Altman and Pearson's correlation analyses results for SBP and DPB, respectively for CNNs and the state-of-the-art models directly compared to PINNs. Supplementary Table 18 provides the estimation accuracies averaged over all participants for each model. The results indicate a significant performance improvement with PINNs against the state-of-the-art models, achieving 0.2 ± 12.1 mmHg and 1.6 ± 8.7 mmHg estimation accuracies (1.3 and 1.1 times lower standard deviation of errors compared to the best performance obtained with the state-of-the-art-models), for . For example, the consistent blue tones for the top plot refers to a consistent negative value for the gradient term ∂y NN =∂u 1 , in agreement with the distribution of the u 1 feature values showing a decrease for increasing y NN predictions. c Scattered plots of the gradient values ∂y NN =∂u 1 , ∂y NN =∂u 2 , and ∂y NN =∂u 3 for varying y NN . Colors represent the specific BP elevation maneuver for each data point, such that: yellow points correspond to the values obtained during coldpressor test (CPT); green points correspond to the values obtained during the recovery session. The solid lines represent the trend line for each CPT or recovery session iteration, calculated from the average value for each integer SBP value. The observation for u 1 feature values showing a strong negative correlation with SBP agrees with the negative value for the ∂y NN =∂u 1 . The increase in magnitude for the gradient with increasing SBP can be associated with the volume-pressure dynamics driven by the arterial wall compliance. Subplots a and b for u 2 show sudden changes for high SBP values and sustained changes for the remaining times. The gradient for this feature therefore shows amplified response for certain values of SBP, as shown in subplots b and c. The gradient for the third feature, i.e., rate of change in SBP with changing beat-to-beat heart rate, shows positive values for lower ranges of SBP and negative values for higher ranges of SBP. This change in polarity may be explained due to increased vagal tone dominating over the sympathetic activity with increasing SBP during the CPT.
SBP and DBP, respectively. In these evaluation strategies the training and testing datasets are separated based on specific BP ranges or individuals. Therefore, the train and test datasets present no risk of leakage or overlap. Results show the superiority of PINNs for providing predictions in both out-of-distribution and inter-subject settings.
Model performance based on higher order Taylor's approximation used in physics-based loss function To assess the relationship between estimation accuracy and the degree of the Taylor's approximation polynomial, we conducted an analysis where we trained all PINN models with Taylor's approximation based on second-order polynomials. The results of this analysis are shared in Supplementary Tables 19-21 for SBP, DBP, and PP to provide comparison with base PINN model. We observed that the accuracy improvements are marginal with the use of a second-order Taylor's approximation polynomial (resulting in 1.02 times lower RMSE and 1.01 times higher correlation, on average, for all BP types) compared to the first-order approximation.
Ablation study on physiological features used in Taylor's approximation In order to validate the importance of physiological features in the success of physics-based loss function for the proposed PINNs, we performed an ablation study on the, physiological features. For this analysis, we replaced the three physiological features with the first three dimensions of the flatten-layer outputs during the Root-mean-squared error (RMSE) and Pearson's correlation coefficient in estimating SBP for PINN (green) and conventional CNN (orange) models trained over varying amounts of labeled instances. We observed that the PINN models retain consistently high performance (less than 10 mmHg RMSE and 0.85 correlation) when trained for more than ten labeled data points. c-e The distribution of the gradients, ∂y NN =∂u 1 , ∂y NN =∂u 2 , and ∂y NN =∂u 3 , respectively, calculated as a part of the Taylor polynomial approximated for each model and grouped based on the number of labeled instances (see subplot titles) used in model training. Gray lines show the individual model trends, whereas solid and dashed colored lines represent the mean trend and standard deviation among all models for the given range of training instances, respectively. The subplot legends (i.e., SD avg ) show standard deviation of the gradient values calculated across the group for each SBP and averaged across all points of SBP. A decrease in SD avg is observed among the models trained over higher number of instances, in agreement with the obtained prediction performances. We observe that ∂y NN =∂u 1 and ∂y NN =∂u 3 show a consistent distribution (less than 0.10 standard deviation in the discovered trend calculated within the group) with as low as fewer than 40 labeled training instances. Whereas the second gradient, ∂y NN =∂u 2 , demonstrates a higher deviation in constructed trend due to the complex behavior of the second feature, u 2 , that exhibit sudden changes during high levels of SBP (see Fig. 6a-b middle subplots).
formation of Taylor's approximation used in the calculation of the physics-based loss function for PINNs. This means, as a part of the physics-based loss, we calculated the partial derivatives of output with respect to the first three dimensions of the flatten layer output, and used the changes in these flatten layer outputs to obtain the Taylor's approximation of the next cycle output. The results of this analysis are shared in the last two columns of Supplementary Tables 19-21 for SBP, DBP and PP. We observe that replacing the physiological features with arbitrary features causes instability in model weight optimization, where for certain participants, the weights in models did not converge, and the for the remaining models, estimation accuracy degraded significantly (resulting in 1.9, 2.0, and 3.2 times higher RMSE and 1.4, 1.5, 1.8 times lower correlation, on average, for SBP, DBP, and PP, respectively). The results indicate the importance of selecting physiologically relevant features for the formation of Taylor's approximation.

DISCUSSIONS
Our contributions in this paper are summarized as follows: (i) we present a use of PINNs to extract actionable information from time series data for precision medicine with minimal use of ground truth information. Although the PINNs have already been introduced, conventionally they require well-defined generalized partial differential equations (PDEs). We propose to obtain PDEs at a personalized level by utilizing Taylor's approximations for the gradually changing known cardiovascular relationships between input and output and integrating it into neural network training. This leads to robust predictions with training on minimal ground truth data. (ii) Our proposed technique adds interpretability to the network by establishing relationships with underlying cardiovascular dynamics. We show that the parameters of Taylor's approximation remain consistent across iterations of varying BP elevation maneuvers (e.g., CPT) and show associations with known cardiovascular dynamics. (iii) We demonstrate the effectiveness of proposed PINNs through a comprehensive case study on continuous BP estimation with wearables. We test our models across different bioimpedance-BP datasets, including a total of N = 15 participants, with various kinds of sensors (see Supplementary Fig. 1) covering wide ranges of SBP, DBP, and PP. We show that PINN models retain high estimation accuracies for all individuals while decreasing the required amounts of ground truth data used in training, on average, by a factor of 15 (see Supplementary Tables 2-4). In addition, we compare the PINN performance against various state-of-the-art neural network models including a neural network model of the same architecture with PINNs trained over the same amounts of labeled data, where PINNs obtain significantly higher performance (see Supplementary Tables 2-16). One limitation of the study is the limited number of participants included in the analyses. An analysis over a larger number of individuals will have the potential to adequately represent performance variations due to interindividual variability.
It is important to note that the relationships between the physiological features and BP are non-linear. However, between consecutive cardiac cycles, we can assume that the gradients of BP with respect to the physiological features are not evolving fast given that the underlying cardiovascular characteristics are slowly varying. Our physics-based loss includes Taylor polynomials defined for consecutive beats, and therefore the use of firstorder terms in this polynomial should yield a reasonable approximation. We provide an analysis comparing the effects of adding second-order terms to the first-order Taylor's approximation polynomial on BP estimation accuracies in Supplementary  Tables 19-21, where we see a marginal improvement as expected. One opportunity that can be explored in the future is to construct Taylor approximation based on cardiac cycle pairs with higher temporal gaps (e.g., non-consecutive beats with 1 min separation). In this case, a higher-order Taylor polynomial would be more effective in achieving accurate approximation for the non-linear relationship between the physiological features and BP.
The Taylor polynomial calculated with PINNs offers an approximation of the input-output dynamics based on the boundary conditions provided with limited ground truth data. The use of Taylor polynomial approximation over hand-crafted equations (e.g., linear regression models for PTT and BP 66 ) for building PINNs address several important challenges, such as; (i) infeasibility of defining static equations that work under differing physiological contexts (e.g., exercise, stress, recovery, rest) 67 , (ii) inability to represent and mimic complex cardiovascular dynamics as it is challenging to estimate the underlying parameters for personalization. We tested our framework on datasets obtained as a part of three previous studies. These studies focus on the various implementations of bioimpedance sensors for high-fidelity capture of the physiological signals and quantify the effectiveness of their approaches through a validation study involving BP; the studies corresponding to graphene-HGCPT 3 and calfree-HGCPT 46 datasets involve features extracted from multi-channel bioimpedance waveforms used for BP estimations, while the study corresponding to the ring-CPT 47 dataset involves features extracted from a single-channel bioimpedance waveform used for BP estimations. All these datasets leverage AdaBoost regressor for the BP estimation task. To provide a fair comparison between the advancements of our proposed techniques against the previous work as well as the state-of-the-art machine learning models (AdaBoost regressor, Rocket regressor, Random Forrest regressor), we evaluated all models under same evaluation strategies.
The framework presented in this paper demonstrates a unique way to advance AI algorithms for wearable time series analysis with reduced dependency on the labeled training data. This is achieved based on the proposed implementation of PINNs that approximate input output relationships of the cardiovascular system without the need to define a closed form equation. It is important to note that, multiple challenges exist for the cuffless BP estimation framework, where there is limited evidence on the accuracy for most commercially available cuffless BP technologies [68][69][70] . Major challenges include the generation of accurate models that require minimal calibration to work for a myriad of individuals in unconstrained environments. Conventionally, this required model training over an exhaustive amount of labeled data. We show the higher performance of PINNs against conventional models in obtaining inter-subject BP estimations when only four subject-specific calibration points (first, last, min., and max. BP points) are used for model calibration. This analysis has two limitations: (1) The inter-subject models still require calibration points to be provided by end-users; and (2) although these four points can be acquired with a regular cuff, in a realworld scenario identifying the min. and max. points is not feasible. Implementation of a more realistic evaluation scenario can provide useful insights for the utility of the proposed technique for real-world deployment. This paper improves the data-intensive requirements of AI algorithms during model training and may motivate other researcher groups with limited access to ground truth information to apply proposed techniques to their framework in estimating hidden important physiological parameters from wearable sensor measurements. This work does not focus on the type of deep learning architecture that is adopted for PINN. In future work, different deep learning architectures can be tested with the addition of the proposed physics-based loss to assess the importance of the architecture to the PINN performance.

Definitions of Taylor's polynomial, physics-based and conventional loss functions used in PINN model design
Using the feature vector u * , we define a polynomial with Taylor's approximation around i-th segment as shown in Eq. (2).
P i ð x * ; u * ; ΘÞ represents this Taylor polynomial approximated based on i-th segment. ∂f NN =∂u k iÀth segment is calculated with neural network auto-differentiation for the i-th segment. The output of this polynomial can be evaluated for any given x * ; u * pair and neural network weights. Note that the bioimpedance and blood pressure data are sequential, e.g., i-th and (i+1)-th segments represent consecutive beats. We leverage this sequential nature of our data by evaluating Taylor polynomials approximated around i-th segment for input values at (i+1)-th segment, i.e., P i ð x * iþ1 ; u * iþ1 ; ΘÞ. Next, we calculate a residual resulting from the difference between the neural network prediction and the Taylor polynomial evaluated at the (i+1)-th segment as shown in Eq. (3).
Here, h i x * iþ1 ; u * iþ1 ; Θ denotes this residual value evaluated at (i+1)-th segment using Taylor's approximation around i-th segment, such that, The value of h represents a physics-based loss for the neural network. Given that h is calculated unsupervised (i.e., labels of output are not used), we can calculate h for any given input sequence. We evaluate the value of h for all consecutive input segments and use the mean squared sum of this evaluation for the physics-based loss function, as shown in Eq. (5).
where R is the total number of segments. The conventional loss function, however, only uses the labeled training points, as shown in Eq. (6).
where S is the number of labeled data instances.
Neural network design, hyperparameter selections, and model training We use identical model architecture, input-output structure, and layer hyperparameters for PINN and conventional CNN models for fair analysis. We define two separate model inputs, x * and u * , representing the zero-padded, down-sampled (sampling rate of 30 Hz) and segmented bioimpedance beats, and the features (i.e., three physiological features extracted from bioimpedance), respectively. The down-sampling to 30 Hz allows the convolutional neural networks to focus on the morphology of the signal and extract more generalized patterns as opposed to focusing on the high-resolution sequential nature of the time series data that contains redundant information due to high sampling. The segmented bioimpedance beat, x * , is connected to a two-layer 1D-CNN network (first layer number of filters: 32, kernel size: 5, activation: 'RELU', second layer number of filters: 64, kernel size: 3, activation: 'RELU'), with a max-pooling (pool size: 3, strides: 1) is applied to its output, followed by flattening. We then use a concatenation layer to combine flattened CNN outputs with u * . The concatenated layer was then connected to a series of fully connected network (FCN) layers (layer-1 number of neurons: 60, activation: 'RELU'; layer-2 number of neurons: 1) providing the model estimations.

Study description and human participation
Three datasets are used in model training and evaluation: graphene-HGCPT 3 , calfree-HGCPT 46 , ring-CPT 47 . Each dataset contains the raw time series measurements obtained with a wearable-form factor bioimpedance sensor, and the corresponding reference BP values acquired from a medical-grade finger cuff (Finapres NOVA). All the experiments with the human participants were performed under the approval of the Institutional Review Board of the Texas A&M University (IRB no. IRB2017-0086D and IRB2017-0335D), where all participants provided written informed consent to take part in the experiments. The graphene-HGCPT dataset involves N = 6 participants (1/5 female/male, age range/ median: 21-31/25) that were asked to go through multiple sessions of a BP elevation routine involving HG exercise followed directly by CPT and recovery. The participants wore bioimpedance sensors that used graphene e-tattoos placed at the participants' wrists along the radial artery. The calfree-HGCPT dataset involves N = 5 participants (all male, age range/median: 20-25/23) that were asked to go through multiple sessions of the HG and CPT protocols. The participants wore a silver electrode-based wristband at different positions. We use data collected at POS1 corresponding to the placement of electrodes aligned with the participants' radial arteries. Ring-CPT dataset involves N = 4 participants (all male, age: range/median 19-26/21) that were asked to go through multiple sessions of CPT and recovery. The bioimpedance data was collected with a ring-worn bioimpedance sensor placed at the participants' ring fingers. The overall evaluation takes place over 16 Bioimpedance signal pre-processing and feature extraction Bioimpedance modality measures deep tissue characteristics of the human body: tissue and cell compositions and their transient behavior due to physio-mechanical activities (e.g., blood flow, respiration, body fluid shifts, body fat-muscle composition changes) based on a very small, high-frequency non-invasive electrical signal injected between two contact points. The induced voltage signal obtained at additional pairs of contacts changes with the tissue composition changes and their electrical properties. When bioimpedance sensors are placed along the arteries, the acquired signal changes quasi-periodically with the change in the artery volume due to the arrival of the BP pulse wave at each heartbeat cycle. We use simultaneously acquired and synchronized bioimpedance and blood pressure (Finapres NOVA) data stream in each dataset. We segmented the data into beat-to-beat intervals based on automatically detected fiducial points. In order to isolate the blood volumetric changes that are quasiperiodic with heart rate from the raw bioimpedance time series data stream, we applied a second-order low-pass Butterworth filter with a 6 Hz cutoff frequency to the raw bioimpedance waveform. This digital filter mitigates the out-of-band noise while still allowing us to measure extreme heart rates. Next, we find the consecutive maximum slope points of the systolic wave based on the peak amplitude locations of the first inverted signal derivative. We ensure that the peak detection algorithm is selecting the most prominent peaks that are corresponding to the maximum slope points in the original signal with a plausible periodicity that is around the heart rate. After marking the maximum slope points, each zero-crossing point prior to the maximum slope point in the first derivative signal marks the location of the start of the cardiac cycle as well as the end of the previous cardiac cycle. This algorithm has been used and validated based on our previous work 71,72 . For the ring-CPT and calfree-HGCPT datasets, we use the delta bioimpedance waveform (i.e., ΔZ : change in bioimpedance in the order of 50-100 mΩ, through the removal of baseline impedance -Z 0in the order of 20-100 Ω.) normalized by Z 0 , and ΔZ with no normalization of the graphene-HGCPT dataset due to the unavailability of Z 0 baseline value.
We extract nine fiducial points from each beat-to-beat bioimpedance waveform that is sampled at 250 Hz (providing 4 ms time resolution), as shown in Fig. 2 to calculate physiological features, u ð1Þ , u ð2Þ , and u ð3Þ , where u ð1Þ ¼ ΔZ ð Þ , and u ð3Þ is 60= t J À t A ð Þ, i.e. heart rate, with ΔZ ð Þ i and t i being the delta bioimpedance amplitude and time instance of the i-th fiducial point (i.e., i 2 fA; B; ; Jg). We apply a three-beat moving average with one-beat overlap to the segmented bioimpedance waveforms and calculated physiological features and blood pressure values (i.e., systolic, diastolic, and pulse pressure). Prior to training of the model, we normalize (zero mean and unity standard deviation) all model inputs and outputs based on the complete dataset. Due to varying ranges of feature values, to avoid bias towards a specific feature during model optimizations, normalization is applied individually to each feature based on the data from all participants. Prior to the statistical analyses, the model output values are converted back to mmHg units based on the initial mean and standard deviation values used in normalization.

Model training with minimally labeled data
We designed a unique criterion to train neural network models with minimum labeled data and tested them over the complete BP range. To achieve this, we divided each participant's data into K segments, with K being the output BP range (i.e., corresponding to 1 mmHg increment between each consecutive segment), and randomly selected one output label from each segment to be included in the supervised training set. For example, in a dataset with 1000 samples and ranging systolic pressure of 120-160 mmHg (K = 40), the train set includes only 40 samples (4% of the dataset for supervised training), while the test set includes 960 samples (96% of the dataset used for testing). The PINNs and conventional CNNs were trained with minimal labeled data. To provide a fair comparison between the two models, we terminated the model training when the supervised training losses reached 0.01, where for PINNs the physics-based loss values are not included. Additionally, we observe that beyond a certain point, the conventional neural networks tend to overfit the training data, as shown in Supplementary Figs 10-12, further increasing the testing error, while the PINN model is prevented from overfitting due to the physics loss in the objective function.
Train and test of BP estimation with state-of-the-art time series classical regression models We tested three time series regression models: AdaBoost regressor 49 , Rocket regressor 50 , Random Forest regressor 51 . The models are retrieved from publicly available Python (version 3.9) libraries: for AdaBoost regressor, scikit-learn (https://scikitlearn.org/) 73 , for Rocket and Random Forest regressors, sktime (https://github.com/sktime/sktime/) 74 . To run AdaBoost and Random Forest regressors, we extracted in total 16 features from inverted bioimpedance waveforms (eight out of 16: time-based, remainder eight out of 16: amplitude-based), based on a total of nine fiducial points extracted from the bioimpedance waveform (see Fig. 2). AdaBoost regressors used ensemble of Decision Tree regressors with maximum depth of 15, and number of estimators of 100. Supplementary Fig. 13 shows the effect of different hyperparameters on the AdaBoost performance, demonstrated over an example case -SBP estimation with 4-fold crossvalidation. Random Forest regressors used an ensemble of Decision Tree regressors built on random intervals, with minimum interval width of three, and number of estimators of 100. For Rocket regressor, the raw segmented bioimpedance waveforms concatenated with extracted features were provided as inputs with the number of kernels selected as 100. For each model, three training criteria were tested: (i) minimal training criterion, (ii) 4-fold cross-validation, (iii) 8-fold cross validation. For K-fold crossvalidation analyses, the dataset is divided into K-sets of equal length, where K-1 sets are used for training, and the remaining set is used for testing. In addition, to achieve fair comparison, we included the training instances resultant from minimal training criterion, as described earlier, in the training sets defined based on K-fold cross-validation.
Train and test of BP estimation with state-of-the-art time deep learning We tested four additional time series deep learning models for benchmarking: long-short-term memory (LSTM) 54 , CNN+Bi-GRU +Attention 55 , ResNet 53,56 , and Transformer 57 . We followed the methodologies provided by each corresponding paper while building the deep learning architectures. The bidirectional LSTM model includes one-layer 1-D CNN network (number of filters: 32, kernel size: 5, activation: 'RELU'), followed by three-layer LSTM (hidden units: 64,64,32), and two-layer FCN (first layer number of neurons: 128, activation: 'RELU', second layer number of neurons: 1, activation: linear). The 3 physiological features are concatenated with the output of the third LSTM layer and then passed through the FCN layers. The CNN+Bi-GRU+Attention model architecture comprises three convolution modules (number of CNN layers in each module: 2, 2, 3). Each CNN layer has kernel size of 3, activation function of 'RELU', and padding of 'SAME''. Each CNN layer is followed by a batch normalization layer. After each module a max-pooling layer (pool size: 3) is added. The number of filters scales up by a factor of 2 (from 64 to 256) as it passes through each module. Next, a bidirectional gated recurrent unit (Bi-GRU, number of hidden nodes: 64 for forward and backward layers) and an attention mechanism that generates a context vector through computing a weighted sum of the time steps on the output of Bi-GRU are added. The attention mechanism output is concatenated with the three physiological features, followed by two-layer FCN (first layer number of neurons: 128, activation: 'RELU', second layer number of neurons: 1, activation: linear). The ResNet model consists of a one-layer 1-D CNN network (number of filters: 16, kernel size: 5, activation: 'RELU') followed by four sets of residual blocks with varying number of filters (16,32,64,128). Each set is composed of four residual blocks, with each block having a twolayer 1-D CNN network (kernel size: 3 for both layers, first activation: 'RELU', second activation: 'NONE'), followed by an element-wise addition operation with a skip connection (a 1-D CNN layer, kernel size: 1, same number of filters as the residual block). After the residual blocks, a Global Average Pooling layer is used to reduce the dimensions of the output tensor. Then a dense layer is added (number of neurons: 128, activation: 'RELU'), and the three physiological features are concatenated to the output of this layer. Finally, a two-layer FCN (first layer number of neurons: 32, activation: 'RELU', second layer number of neurons: 1, activation: linear), is used to generate the BP estimation. The Transformer model uses the encoder, with N = 2 identical transformer layers processing the input through self-attention mechanisms and FCN. Each layer has 4 parallel attention heads. The dropout rate is set to 0.1. The output of the transformer module is concatenated with the three physiological input features, followed by two-layer FCN (first layer number of neurons: 64, activation: 'RELU', second layer number of neurons: 1, activation: linear).

Model training with varying numbers of labeled training instances
To compare model performance under a growing number of labeled training instances, we created an initial training set. This set was generated based on the minimal training criterion where the labeled data is split into k bins with k ¼ BP range 2, where BP range is the difference between the maximum and minimum BP values calculated separately for SBP, DBP, and PP. This divides the dataset into different bins with bin widths equaling 0.5 mmHg. We then randomly select one point from each bin resulting in k total of labeled data points for the initial training set. We train k/2 different models for each BP output for PINN and CNN, where each model received varying numbers of labeled training points, (an increment of two in the number of labeled training points for consecutive models). For example, the first model is trained with one labeled training instance, the second model is trained with three, and the N-th model is trained with 1 + N × 2 labeled training instances. For each model, we measure the performance against the reference BP using the test set corresponding to the ground truth BP values that are not included in the training.

Hemodynamic relationships
Systolic and diastolic blood pressure values correspond to the maximum and minimum pressure points in the artery. During systole, the heart ejects blood into the aorta, which then travels through the arterial tree. Pulse pressure is the difference between systolic and diastolic blood pressure. The changes in PP and SBP are proportional to volumetric changes based on the arterial wall characteristics defined by compliance 75 , where the parameters for the equation changes per individual (see Supplementary Fig. 3).
The blood pressure pulse wave velocity (PWV) is also related to the arterial wall characteristics and its response to changing pressure. The relationship for PWV is defined as: PWV ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffi Eh=Dρ p 76 , where D is the diameter of the radial artery, h is the wall thickness of the radial artery, ρ is the average density of human blood, and E is the elastic modulus of the artery wall that has a positive correlation with blood pressure 77 .
HR and BP do not necessarily increase at the same rate due to different underlying CV control mechanisms. The relationship between HR and PP during CPT is shown to have varying correlations 25,26 . For certain individuals, reciprocal changes in cardiac autonomic regulation induce a sustained increase in HR with an increase in BP, while for others, CPT induces a decrease in HR after an initial increase, likely due to the co-activation of the vagal and sympathetic outflow at the heart level (see Supplementary Fig. 3d).

Performance metrics
To assess the trained model performance on a test dataset, we calculate per-subject and group mean error (ME), the standard deviation of the error (SDE), root-mean-squared error (RMSE), along with confidence intervals and Pearson's correlation coefficients, based on true and estimated blood pressure values. Additionally, we report the results according to the AAMI standard for BP devices 48 .

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
The graphene-HGCPT dataset analyzed during the current study is available in the PhysioNet repository, doi.org/10.13026/ce62-pc98. The calfree-HGCPT and ring-CPT datasets analyzed during the current study are available from the corresponding author on reasonable request.

CODE AVAILABILITY
Algorithms used for neural network model and physics loss function design, model training and testing are available on GitHub at https://github.com/TAMU-ESP/pinnfor-physiological-timeseries. The custom code used for data visualization are available from the corresponding authors upon request.